Pericentromeric recombination suppression and the ‘large X effect’ in plants

The X chromosome has been reported to be a major contributor to isolation between closely related species, a pattern known as the ‘large X’ effect (LXE). The causes of the LXE are not clear, but the leading theory is that it is caused by recessive species incompatibilities exposed in the phenotype due to the hemizygosity of X-linked genes in the heterogametic sex. However, the LXE has also been reported in species with relatively recently evolved sex chromosomes, where the Y chromosome is not completely degenerate and X-linked genes are not hemizygous, such as the plant Silene latifolia. Recent genome sequencing and detailed genetic mapping in this species revealed a massive (> 330 Mb) non- or rarely-recombining pericentromeric region on the X chromosome (Xpr) that comprises ~ 90% of the chromosome and over 13% of the entire genome. If any of the Xpr genes are involved in species incompatibilities, this would oppose interspecific gene flow for other genes tightly linked in the Xpr. Here we test the hypothesis that the previously reported LXE in S. latifolia is caused by the lack of recombination on most of the X chromosome. Based on genome-wide analysis of DNA polymorphism and gene expression in S. latifolia and its close cross-compatible relative S. dioica, we report that the rarely-recombining regions represent a significant barrier for interspecific gene flow. We found little evidence for any additional factors contributing to the LXE, suggesting that extensive pericentromeric recombination suppression on the X chromosome is the major, if not the only, cause of the LXE in S. latifolia and S. dioica.

Hybridisation and gene flow between closely related species are common, and the evolution of reproductive barriers is the crucial step in the speciation process. In animals, sex chromosomes are known to play a disproportionately large role in isolation between incipient species (e.g., [1][2][3]). Interspecific hybridisation often leads to an asymmetric outcome, with hybrid inviability and sterility usually occurring in the heterogametic sex, an observation often called Haldane's rule (HR), indicating that sex chromosomes play a major role in speciation 4,5. The X chromosome was also proposed to have a disproportionately large role in the dysfunction of hybrids in comparison to the autosomes, known as the large X effect (LXE) [6][7][8]. The LXE and HR are often referred to as the "two rules of speciation".
The LXE and HR are thought to be caused by recessive species incompatibilities exposed in the phenotype due to the hemizygosity of X-linked genes in the heterogametic sex 7,9. Thus, the reports of HR and the LXE in species with recently evolved non- or partially-degenerate Y chromosomes 10, such as Silene latifolia and its relatives discussed below 11,12, were surprising and cast doubt on whether hemizygous X-linked genes are the universal major cause of HR and the LXE. Other possible causes of HR and the LXE include meiotic drive on sex chromosomes [13][14][15], misregulation of the X chromosome in hybrids 1,3, quicker evolution of genes linked to the X chromosome (faster-X theory; e.g., [16][17][18]), a higher density of male sterility loci on X chromosomes than on autosomes 1, and quicker evolution of spermatogenesis-related genes and stronger sexual selection exerted on males than on females (faster males theory; 19).
Here we analyse another possible cause of the LXE: the presence of a massive block of rarely- or non-recombining DNA on the X chromosome, as recently reported for S. latifolia 20. Extensive pericentromeric recombination suppression (PRS) on the very large (~ 400 Mb) S. latifolia X chromosome appears to be an extreme case of a general tendency for long chromosomes to have large central regions with rare recombination. The reasons for this are not clear, but they are discussed in the literature 21,22. Regions of low recombination often show high genetic differentiation between species because stronger and wider linkage disequilibrium (LD) in such regions links loci involved in interspecific incompatibility (barrier loci) to a larger portion of the genome, which suppresses introgression in such regions (e.g., [23][24][25][26][27][28][29]). Non-recombining regions could contribute to the maintenance of species integrity despite on-going interspecific hybridisation, as noted

Transcriptome dataset
The analyses in this study are based on sequence data from 12 S. latifolia and 12 S. dioica females (Table 1) grown in the glasshouse (20 °C, 15-h lighting) from seeds collected in the wild. Actively growing shoots with flower buds were used for total RNA extraction with a Qiagen RNeasy Plant Mini Kit with on-column DNase digestion. Isolation of mRNA, cDNA synthesis and high-throughput sequencing were conducted according to the standard Illumina RNA-Seq procedure at the WTCHG genomics facility (Oxford, UK). The resulting sequence reads were mapped to the female reference transcriptome 46 that was also used in the genetic mapping 50,52. Reads were mapped with BWA mem 0.7.17 53 and the alignments sorted with Samtools 1.7 54. SNP calling was then done with Samtools mpileup (options: -d 1000 -q 20 -Q 20) and sites were filtered with bcftools filter 1.7. The resulting multisample vcf file was converted to fasta alignments using the ProSeq software 55, available from https://sourceforge.net/projects/proseq/. ProSeq was also used for further processing and analysis of the resulting datasets. Gene expression was quantified as per-gene FPKM (fragments per kilobase per million reads), calculated with RSEM 56.
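As an illustration of the expression measure named above, the FPKM definition can be sketched in a few lines of Python. This is a minimal sketch of the textbook formula only; the function name and the example numbers are hypothetical, and RSEM itself estimates fragment counts and effective transcript lengths with a statistical model that is not reproduced here.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook definition only (an assumption for illustration); RSEM uses
    model-based counts and effective lengths rather than raw values.
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Divide by length in kb (length / 1e3) and library size in millions (total / 1e6).
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2 kb long, in a library of 10 million fragments.
example_value = fpkm(500, 2000, 10_000_000)
```

The normalisation by both gene length and library size is what makes values comparable between genes and between the 24 sequenced individuals.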
The following groups of genes were used in the analyses: rarely-recombining autosomal genes (rareA); rarely-recombining X-linked genes in the Xpr region (rareX); frequently-recombining autosomal genes (freqA); frequently-recombining X-linked genes in the qXdr region (freqX). These groups were defined according to the location of a gene in the rarely-recombining central chromosome region or the actively recombining ends of the chromosomes, based on the S. latifolia female genome sequence 20. Pericentromeric recombination suppression is extensive on all S. latifolia chromosomes 20,50 and genetic analysis detected no recombination in the central regions of the chromosomes, while recombination at the ends of the chromosomes was frequent 50. As the transition between the frequently recombining ends of the chromosomes and the rarely-recombining central regions is quite sharp 20, we reasoned that the split of the genes into the freqA, rareA, freqX and rareX categories is well justified.

Genome-wide polymorphism statistics and comparisons between gene categories
Five polymorphism indices, namely nucleotide diversity (π) 57, Tajima's D, F_ST, D_xy and Z_nS, were measured using ProSeq 55 for all sites, fourfold degenerate sites and the first two codon positions. The fourfold degenerate sites are considered the most neutral type of sites in the genome (e.g. Fig. 2 in reference 60), while the first two codon positions are likely the least neutral. All the above statistics were first plotted against genomic positions (using the R package ggplot2 61) to obtain a genome-wide overview (fourfold degenerate sites for π and Tajima's D; all sites for F_ST, D_xy and Z_nS). Then, their values were compared using the Kruskal-Wallis test and the Wilcoxon rank-sum test in the following ways: (1) between frequently-recombining and rarely-recombining groups of genes analysed separately within each species (for Tajima's D and π, using fourfold degenerate sites or first two codon positions; and for Z_nS, using all sites) or between the two species (for D_xy and F_ST, using all sites); (2) among frequently-recombining and rarely-recombining genes on the autosomes and the X chromosome (from the frequently-recombining qXdr region and the rarely-recombining Xpr region, respectively). These statistics were also estimated for each chromosome using all sites, fourfold degenerate sites, and first two codon positions. To correct for the ploidy difference in comparisons between X-linked and autosomal genes, the estimates of π in autosomes were adjusted to 75% of the original values. Both adjusted and original values are reported here, and whenever this correction is used, it is explicitly stated in the text.
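The definitions of π and D_xy, together with the 75% ploidy adjustment described above, can be illustrated with a short Python sketch. It is a simplified illustration and not the ProSeq implementation: the function names are hypothetical, sequences are assumed to be pre-aligned strings of equal length, and positions carrying anything other than A, C, G or T are simply skipped.

```python
from itertools import combinations


def pairwise_diff(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            differing += x != y
    return differing / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """pi: mean pairwise difference per site within one species."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(seqs_1, seqs_2):
    """D_xy: mean pairwise difference per site between two species."""
    total = sum(pairwise_diff(a, b) for a in seqs_1 for b in seqs_2)
    return total / (len(seqs_1) * len(seqs_2))


def adjust_autosomal_pi(pi, factor=0.75):
    """Scale autosomal pi to the X-chromosome ploidy (3 X copies per 4 autosomal)."""
    return pi * factor
```

For example, an autosomal π of 0.0324 becomes 0.0243 after adjustment, which is the value that would then be compared with π of X-linked genes.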

Demographic modelling
To quantify and compare the extent of gene flow in rarely- and frequently-recombining genes in the two Silene species, we used five demographic models (from 62) fitted with the Poisson random field-based demographic inference framework implemented in the dadi package 63. These models are (Fig. 1): split_mig, a population split with bi-directional migration and constant population size; IM, a population split (isolation) with migration equal in both directions and population size change; IM_2M, IM with heterogeneous migration that is allowed to differ between two classes of sites across the genome; IM2, IM with migration allowed to differ between the two directions; and IM2_2M, IM2 with heterogeneous migration for two classes of sites. The models with heterogeneous migration (IM_2M and IM2_2M) include two categories of genomic sites with different migration parameters. These models were chosen to test whether the two species had experienced significant population size change since divergence, whether gene flow differed between the two directions and whether gene flow was heterogeneous across the genome (which could indicate significant differences in gene flow between autosomes and the X chromosome in each recombination category). Heterogeneous gene flow was tested using two sets of nested models: IM versus IM_2M, and IM2 versus IM2_2M. The fit of these models to the data was compared with likelihood ratio tests (LRT). All these models were run for frequently-recombining and rarely-recombining genes separately. Additionally, models IM2 and IM2_2M were run for the following groups of genes: rarely-recombining autosomal genes (rareA); rarely-recombining X-linked genes in the Xpr region (rareX); frequently-recombining autosomal genes (freqA); frequently-recombining X-linked genes in the qXdr region (freqX). Ten initial runs were performed for each model with a wide parameter range (0-5 for time parameters, 0-10 for migration parameters, 0-100 for population size parameters). Based on the parameter values estimated in these initial runs, parameter ranges were adjusted for a further 30 runs. The best-fitting model (taking, for each model, the run with the highest estimated likelihood) was selected based on the Akaike Information Criterion (AIC). Robustness of parameter estimates of the best-fitting models was evaluated with 100 bootstrap runs, with the confidence intervals calculated as M ± 1.96X (where M is the likelihood-based parameter estimate and X is the standard deviation of parameter estimates from the bootstrap runs).
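The model-comparison arithmetic described above (AIC, the likelihood ratio statistic for nested models, and the M ± 1.96X bootstrap interval) can be sketched as follows. This is an illustrative sketch under stated assumptions and not the scripts used in the study: all function names are hypothetical, the LRT P value uses a plain chi-square reference distribution without any adjustment, only 1 or 2 extra parameters are supported, and the sample standard deviation is assumed for X.

```python
import math
from statistics import stdev


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


def lrt_statistic(ll_simple, ll_complex):
    """Likelihood ratio statistic for a simple model nested in a complex one."""
    return 2 * (ll_complex - ll_simple)


def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise NotImplementedError("only df = 1 or 2 are handled in this sketch")


def bootstrap_ci(estimate, bootstrap_estimates, z=1.96):
    """Interval M +/- 1.96 X, with X the SD of the bootstrap estimates."""
    sd = stdev(bootstrap_estimates)  # sample SD (assumption)
    return estimate - z * sd, estimate + z * sd
```

A nested pair such as IM versus IM_2M would be compared by passing their best log-likelihoods to `lrt_statistic` and the number of additional parameters as `df`.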

Significant differences in polymorphism statistics between frequently and rarely-recombining genes
The distribution of genetic diversity was similar in the S. latifolia and S. dioica genomes (Spearman's correlation for π (fourfold degenerate sites): R = 0.99, p-value < 2.2 × 10^-16; Spearman's correlation for π (first two codon positions): R = 0.99, p-value < 2.2 × 10^-16; Spearman's correlation for Tajima's D (fourfold degenerate sites): R = 0.30, p-value < 2.2 × 10^-16; Spearman's correlation for Tajima's D (first two codon positions): R = 0.22, p-value < 2.2 × 10^-16). Genetic diversity varied considerably across both genomes, with the highest diversity observed at the ends of the chromosomes and much lower diversity in the central regions (Fig. 2). This corresponds to the distribution of recombination rate reported for the S. latifolia genome, with extensive pericentromeric recombination suppression present on all chromosomes and frequent recombination occurring only near the ends of all chromosomes 20,50. Consistent with this, the extent of linkage disequilibrium, quantified with the Z_nS statistic 59, was higher in the central regions of the chromosomes compared to the actively recombining ends of the chromosomes (Fig. 2). Below we analyse and compare the patterns of DNA polymorphism separately for frequently and rarely-recombining regions. The genes lying in the central chromosomal regions, where no recombination was detected in genetic cross data 20,50, are designated as "rarely" or "low"-recombining, while the genes located in the actively recombining ends of the chromosomes are designated as "frequently" or "high"-recombining, with similar numbers of genes analysed in these categories (2161 and 2261, respectively). At fourfold degenerate sites, median π was 0.0324 and 0.0341 in the frequently recombining autosomal genes (freqA), and 0.0107 and 0.0118 in the rarely recombining autosomal genes (rareA), in S. latifolia and S. dioica, respectively (Supp. Table 2). Median π at fourfold degenerate sites on the X chromosome was 0.0227 and 0.0209 in the frequently recombining genes (freqX), and 0.0029 and 0.0048 in the rarely recombining genes (rareX), in S. latifolia and S. dioica, respectively (Supp. Table 2). In the first two codon positions, median π was 0.0053 in the frequently recombining genes for both species, and 0.0023 and 0.0026 in the rarely recombining genes in S. latifolia and S. dioica, respectively (Supp. Table 3). Median π in the first two codon positions was 0.0043 and 0.0039 in freqX genes, and 0.0011 and 0.0016 in rareX genes of the two species (Supp. Table 3). Median F_ST (all sites) was the highest in both freqX and rareX genes (Supp. Table 1). Median D_xy (all sites) was the second lowest and lowest in freqX and rareX genes, respectively (Supp. Table 1). Median Z_nS of freqX genes was highest for both species, but that of rareX genes was highest only for S. dioica (Supp. Table 1).
The frequently and rarely-recombining regions differed in the level and patterns of DNA polymorphism, with π (fourfold degenerate sites and first two codon positions), Tajima's D (fourfold degenerate sites and first two codon positions), D_xy (all sites), F_ST (all sites) and Z_nS (all sites) all showing significant differences between these regions (Fig. 3). π, D_xy and F_ST were also significantly different for all pairwise comparisons in autosomal and X-linked genes in the two recombination categories (Fig. 3, Supp. Fig. 2). However, after π of autosomal genes had been adjusted for the ploidy difference with the X (by multiplying each value by 0.75), the same pattern remained significant only for fourfold degenerate sites in S. dioica (Fig. 3a, Supp. Fig. 1). RareX genes had significantly different adjusted π from all other groups in fourfold degenerate sites for both species, and in the first two codon positions in S. latifolia (Fig. 3a, Supp. Fig. 1). In the first two codon positions in S. dioica, rareX genes had significantly different adjusted π from freqA and freqX, but not rareA genes (Fig. 3a, Supp. Fig. 1). For Tajima's D, rareX genes did not differ significantly from rareA and freqX genes in S. latifolia, whereas rareX genes in S. dioica differed significantly from all three other groups (freqA, rareA and freqX) (Fig. 3b). In S. latifolia, Z_nS differed significantly between freq (A or X) and rare (A or X) genes but not within recombination categories (between A and X). In S. dioica, the pattern was similar to that of Tajima's D, in that rareX genes differed significantly from all other groups (Fig. 3b).

Demographic modelling
In order to exclude the effect of pericentromeric recombination suppression on gene flow, we conducted separate analyses for rarely-recombining and frequently-recombining regions. For each of these datasets we fitted two pairs of nested models, IM and IM_2M, IM2 and IM2_2M (Fig. 1), that differed in the number of parameters accounting for interspecific gene flow. All these models included population size change after the species split, which turned out to be an essential feature of the models, given that the model without population size change (split_mig, Fig. 1) showed a much lower fit to the data compared to any of the models allowing for population size changes (Table 2). The parameter estimates for population size change showed 2.23 to 2.99-fold population size growth in both species (Table 2), which is consistent with the northward post-glacial expansion of these species from refugia in southern Europe or Anatolia 64. The IM and IM_2M models assumed that gene flow is the same in both directions, while IM2 and IM2_2M allowed for different migration rates in the two directions. Fitting these models to the data revealed that gene flow differs significantly between the two directions (Table 2), with S. latifolia to S. dioica gene flow (M_1) being stronger than in the opposite direction (M_2), which is consistent with the asymmetric reproductive barrier between these species 65.
Table 2. Best parameter estimates of demographic models analysed with dadi, Akaike information criterion (AIC), and results of the likelihood ratio tests (LRT) for nested models (IM versus IM_2M and IM2 versus IM2_2M). The parameters are as in Fig. 1. Confidence levels for parameters of the best-fitting model for each gene category are shown (± 1.96X, where X = standard deviation in 100 bootstrap estimates). The best-fitting model and its parameter values for each category are in bold/bold italics.
The IM and IM2 models assumed that all sites in the genome had the same gene flow, while the more complex *_2M models allowed for two different classes of sites ("A" and "B") with different migration rates. The better fit of the *_2M models to the data (Table 2) demonstrates the presence of significant heterogeneity in interspecific gene flow across the genome. The "A" sites (M_A, M_A1 and M_A2 in Table 2) show much lower migration rate(s) compared to the "B" sites (M_B, M_B1 and M_B2 in Table 2), with a ~ 7 to ~ 70-fold difference between the A and B sites (Table 2). A larger proportion of the analysed sites belonged to the lower migration A-category for rarely-recombining regions (56%) compared to frequently recombining regions (26%). This is consistent with rarely-recombining regions representing a significant barrier to interspecific gene flow. Analyses using separate autosomal and X-linked genes from rarely- and frequently recombining regions ("rareA", "freqA", "rareX", "freqX" gene categories) revealed a similar pattern (Table 2): higher proportions of analysed sites fell into the low migration A-category in the rarely-recombining regions (55% and 50% in "rareA" and "rareX" gene categories, respectively) than in frequently-recombining regions (31% and 28% in "freqA" and "freqX" gene categories, respectively).
The comparison of estimated gene flow for X-linked and autosomal genes reveals that, on average (across M_A1, M_A2, M_B1 and M_B2 in Table 2), gene flow in frequently recombining regions is about twofold lower on the X compared to the autosomes (M_Aut/M_X = 2.1), which is consistent with the significantly higher F_ST for freqX compared to freqA (Fig. 3e) as well as with the large-X effect. On the other hand, the rarely-recombining regions show little difference in migration rates between the X-linked and autosomal genes (average M_Aut/M_X = 1.1), indicating that lack of recombination limits gene flow to a similar extent on the X-chromosome and the autosomes. The estimated time since species divergence (measured in units of 2N_A generations, where N_A is the ancestral population size) was similar for all categories except the frequently recombining autosomal genes, where it was much lower (T_rareX = 3.85; T_freqX = 3.77; T_rareA = 4.84; T_freqA = 0.55, Table 2).

Gene expression divergence in frequently and rarely-recombining regions
To compare the rate of gene expression divergence on the X chromosome and autosomes, we measured expression in transcriptome sequence data from 12 S. latifolia and 12 S. dioica females (Table 1). As expected for closely related species, gene expression in the two species was strongly positively correlated (Table 3). The correlation was the strongest for the frequently recombining X-linked genes (r² = 0.870) and the weakest for the rarely-recombining X-linked genes (r² = 0.781), suggesting that gene expression divergence is slightly faster in rarely- compared to frequently recombining X-linked genes (Table 3). However, the proportion of genes that evolved significantly (t-test P < 0.0001) different expression was the same (10%) in these categories. This proportion was the lowest (7.22%) in the frequently recombining autosomal genes, while in all other gene categories it was close to 10% (Table 4). Only the difference between frequently and rarely-recombining autosomal genes was marginally significant (χ² = 3.595; P = 0.0580) for the number of genes that evolved significantly different expression in the two species. All other pairwise comparisons were non-significant. Taken together, these results indicate that gene expression divergence between S. latifolia and S. dioica is slowest in the frequently recombining autosomal genes (freqA), possibly due to more active interspecific gene flow homogenising the gene pools of these species. Unlike the autosomal frequently recombining genes, expression of the freqX genes is diverging at a similar rate to rarely recombining X-linked genes, which is consistent with X-linkage acting as a partial barrier to interspecific gene flow.
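The pairwise comparison of gene categories reported above is a chi-square test on a 2 × 2 table (genes with and without significantly different expression, in two categories). A minimal Python sketch of that calculation is given below; the function names are hypothetical, no continuity correction is applied (whether the original analysis used one is not stated), and the underlying gene counts from Table 4 are not reproduced here.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    Rows are gene categories; columns are counts of genes with and
    without significantly different expression. No continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def chi2_p_df1(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2))
```

Passing the reported statistic to `chi2_p_df1(3.595)` returns approximately 0.058, in line with the P value quoted in the text.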

Discussion
This study analysed the level and patterns of genetic diversity across the S. latifolia and S. dioica genomes to assess the contribution of the extensive pericentromeric recombination suppression to limiting gene flow between these species. Lack of recombination leads to linkage disequilibrium of a barrier locus with a wider genomic region, which leads to suppressed introgression in such regions even for loci that are not causing any hybrid inviability or reduced fertility. Thus, rarely-recombining regions may be major contributors to the maintenance of species integrity despite on-going interspecific hybridisation (e.g., 23,[30][31][32][33][34]). As S. latifolia and S. dioica regularly hybridise and introgress in overlapping ranges across Europe, rarely-recombining regions, especially the one on the X-chromosome, could be key to maintaining their distinct species identities. We conducted analyses of genetic diversity, interspecific divergence and gene flow separately for regions with 'high' and 'low' recombination rates. While this division of the genome into two classes may appear crude, it does reflect strong differences in recombination rate between the ends and the central regions of all chromosomes. Pericentromeric recombination suppression is quite extensive on all S. latifolia chromosomes, with the central rarely-recombining region comprising most of the length of all chromosomes 20,50. This division into a very large (~ 330 Mb) rarely-recombining central region and small frequently recombining regions at the ends is particularly pronounced on the X-chromosome, which is the largest in the female genome 20. The transition between the frequently recombining ends of the chromosomes and the rarely-recombining central regions is quite sharp 20, and the few genes falling in the transition zones with intermediate recombination rate were excluded from our analysis. Given this distribution of recombination across the S. latifolia genome, the artificial division into 'high' and 'low' (or 'freq' and 'rare') recombination classes reflects biological reality well.
Genetic diversity was observed to be substantially lower in the rarely-recombining central regions of all chromosomes, compared to the actively recombining chromosomal ends. Reduced diversity in rarely-recombining regions is a general phenomenon likely caused by linked selection, that is, selective sweeps 66 and background selection 67, which affect wider genomic regions in rarely-recombining regions due to stronger linkage disequilibrium. Selective sweeps are expected to drive the allele frequency spectrum towards an excess of low frequency polymorphisms, which is detectable by negative Tajima's D values 68. This statistic is indeed more negative in rarely-recombining compared to frequently recombining regions (Fig. 3b; Supp. Tables 1, 2, 3). In particular, rareX genes had significantly more negative Tajima's D than both freqA and freqX genes in S. dioica (both fourfold degenerate sites and first two codon positions), and than freqA genes in S. latifolia (fourfold degenerate sites only) (Fig. 3b, Supp. Tables 2, 3). Genetic diversity in the X-linked genes is lower compared to the autosomes, as expected from their ploidy difference. The lower ploidy of X-linked genes accounts for their lower diversity in frequently recombining X-linked regions (π for freqX and freqA genes were similar after adjustment for the difference in ploidy, except for fourfold degenerate sites in S. dioica), but it is not sufficient to explain the reduced diversity in the massive rarely-recombining Xpr region on the X chromosome, compared to rarely-recombining autosomal regions. Even after adjusting π for autosomal genes, rareX genes still had significantly lower π than rareA genes in both species and types of analysed sites, except for first two codon positions in S. dioica (Fig. 3a, Supp. Tables 2, 3). This may be due to the particularly large size of the Xpr (~ 330 Mb), which includes over a thousand genes 20 and should make linked selection that reduces genetic diversity particularly strong. Indeed, linkage disequilibrium (measured with Z_nS) is strongest in rareX genes (Supp. Table 1).
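For readers unfamiliar with the statistic discussed above, Tajima's D compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by the sample size. The Python sketch below follows the standard published formula; the function name and the example values are hypothetical, and it is not the ProSeq code used in the study.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diffs):
    """Tajima's D for n sequences at one locus.

    seg_sites is the number of segregating sites (S) and
    mean_pairwise_diffs is the mean number of pairwise differences
    per locus (not per site).
    """
    if n < 4 or seg_sites < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diffs - seg_sites / a1) / math.sqrt(variance)


# Hypothetical locus: 10 sequences, 5 segregating sites, all singletons,
# which gives a mean of 1.0 pairwise differences and a negative D.
d_singletons = tajimas_d(10, 5, 1.0)
```

An excess of rare variants, as expected after a selective sweep, lowers the mean pairwise difference relative to the number of segregating sites and therefore pushes D below zero.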
Genetic differentiation between S. latifolia and S. dioica, measured with F_ST, is higher in the rarely-recombining central regions of the chromosomes compared to the actively recombining terminal regions, with rareX genes having significantly higher F_ST than all other groups (Fig. 3e; Supp. Table 1). This is likely caused by reduced gene flow, but the reduced intraspecific genetic diversity in rarely-recombining regions (Figs. 2, 3, Supp. Tables 1, 2, 3) could also have contributed to higher F_ST by increasing the relative proportion of overall genetic diversity that is due to species divergence. Lower D_xy in the central compared to peripheral regions of the chromosomes (Figs. 2, 3, Supp. Table 1) also indicates that high F_ST in the central rarely-recombining regions is, at least partly, caused by reduced intraspecific genetic diversity. However, the demographic modelling reveals consistently lower estimates of interspecific gene flow in the rarely-recombining compared to frequently recombining regions (Table 2). Time since species divergence estimated for rarely- and frequently recombining X-linked genes is very similar (T_rareX = 3.85; T_freqX = 3.77; Table 2), which indicates similar coalescent times for the two groups of X-linked genes and suggests that rare recombination and X-linkage both act as considerable interspecific barriers. This is also consistent with the much lower T for frequently (T_freqA = 0.55) compared to rarely (T_rareA = 4.84) recombining autosomal regions, with higher interspecific gene flow in the former compared to the latter (Table 2) homogenising the gene pools of the two species and reducing T.
While rarely-recombining regions appear to represent significant barriers to interspecific gene flow, X-linkage may also contribute significantly to species differentiation, as indicated by higher F_ST values in the X-linked compared to autosomal genes for regions with similar recombination rates (i.e. rareX versus rareA, and freqX versus freqA; Fig. 3e). However, D_xy is lower in the X-linked compared to autosomal genes both for frequently- and rarely-recombining regions (Fig. 3d), indicating that higher F_ST for X-linked genes is at least partly caused by lower intraspecific genetic diversity on the X-chromosome. Furthermore, the fitting of demographic models to the data did not show significantly lower gene flow for X-linked compared to autosomal genes for regions with similar recombination rate (Table 2). The proportion of sites (P) falling into the low gene flow category was similar for X-linked and autosomal genes within the same recombination category (P_rareX = 0.50 vs P_rareA = 0.55; P_freqX = 0.28 vs P_freqA = 0.31; Table 2). Thus, the effect of X-linkage (if any) on gene flow appears to be much less pronounced compared to that of the reduced recombination rate in pericentromeric regions.

Conclusion
In this study, we tested whether the pericentromeric recombination suppression in the massive Xpr region 20 on the S. latifolia X chromosome can account for the LXE previously reported for this species 12. While the LXE in animals has been shown with direct experiments 69, the evidence for the LXE in S. latifolia 12 and our analyses presented above are indirect, being based on evolutionary genetic analyses of genetic diversity and gene flow between the species. We report that population differentiation (F_ST; Fig. 3e) and the proportion of sites with low interspecific gene flow (P in Table 2) are significantly higher in the rarely-recombining compared to the actively recombining regions on the X-chromosome and the autosomes. This reveals an important role of the rarely-recombining regions in limiting gene flow between the two species. As the rarely-recombining region comprises a larger proportion of the X-chromosome (~ 90%) compared to the autosomes (~ 80%) 20, this likely disproportionately reduces overall interspecific gene flow on the X, contributing to the 'large-X' effect. We found little evidence that X-linkage by itself contributes significantly to the LXE in S. latifolia and S. dioica. The frequently recombining part of the X-chromosome does have a significantly higher F_ST compared to the frequently recombining regions on the autosomes (Fig. 3e), but this appears to be caused by lower genetic diversity in the X-linked genes. We conclude that the lack of recombination in pericentromeric regions creates a significant barrier for interspecific gene flow, which is a cause of the LXE in S. latifolia and S. dioica due to a disproportionately large pericentromeric region on the X-chromosome.

Figure 1. Schematic representation of the five demographic models used in this study. In each diagram, the width of the tree branches at the top shows the current population sizes (N_1 and N_2), and moving down (backward in time) the inferred demographic history since the species split. The model that assumes constant population size (split_mig) is represented by straight lines. Models that allow for exponential population size changes since the split (IM, IM_2M, IM2, IM2_2M) have curved lines and include the parameter s, which is the relative size of population 1 at the split (the relative size of population 2 is 1-s). N_A is the ancestral population size before the split, and is not a free parameter 63. All population sizes (N_1 and N_2) are expressed in units of N_A. The time parameter, T, is given in units of 2*N_A generations. All migration parameters (M, M_1, M_2, M_A, M_B, M_A1, M_A2, M_B1, M_B2) are represented by horizontal arrows and expressed in units of 2*N_A*m, where m is the proportion of the receiving population consisting of immigrants in each generation. The "A" and "B" indexes for migration parameters reflect the migration rate at two classes of sites in the genome in the IM_2M and IM2_2M models.
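The scaling described in the caption can be made concrete with a small Python sketch that converts the scaled parameters back to natural units. The function names are hypothetical, and the value of N_A used in the example is an arbitrary placeholder: the ancestral population size is not given in this section, so the resulting numbers are for illustration only.

```python
def generations_since_split(t_scaled, n_ancestral):
    """Convert T (in units of 2*N_A generations) to generations."""
    return t_scaled * 2 * n_ancestral


def migrant_fraction(m_scaled, n_ancestral):
    """Convert M = 2*N_A*m to m, the per-generation immigrant proportion."""
    return m_scaled / (2 * n_ancestral)


# Placeholder ancestral size (assumption, not an estimate from the study).
N_A_PLACEHOLDER = 10_000
example_generations = generations_since_split(3.85, N_A_PLACEHOLDER)
example_m = migrant_fraction(2.0, N_A_PLACEHOLDER)
```

Because both T and M are scaled by the same N_A, ratios of T or of M between gene categories can be compared without knowing N_A, provided the categories share the same ancestral size.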

Table 1. Silene samples used in this study. Newly submitted samples are part of the BioProject PRJNA1012686.

Table 4. The numbers and proportions of genes that evolved significantly (t-test, P < 0.0001) different expression between S. latifolia and S. dioica.
a, X-linked genes in the frequently recombining qXdr region; b, autosomal genes in frequently recombining regions; c, X-linked genes in the rarely-recombining Xpr region; d, autosomal genes in rarely-recombining regions.